Method, Apparatus and Computer Program Product Providing Storage Network Dynamic Tuning of I/O Flow with Queue Depth

ABSTRACT

In accordance with a computer program product, apparatus and a method there is provided a redundant storage network wherein a host computer operates with a plurality of storage devices by monitoring conditions of the multipath storage network and controlling a storage multipath device driver, in conjunction with an associated storage multipath device input/output (I/O) pending queue, to increase I/O throughput to a storage device driver, such as a disk device driver, when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

TECHNICAL FIELD

These teachings relate generally to data storage networks, systems and methods and, more specifically, relate to data queue management apparatus and methods that are useful in a storage area network (SAN) type architecture.

BACKGROUND

It is common practice for a data storage device, such as a disk-based data storage device, to have a recommended value for the number of requests that it can handle under good conditions and under error conditions. This value is typically referred to as a Queue Depth, and can be used by a disk device driver to control the input/output (I/O) flow to the storage device.

In a multipath configuration environment there are multiple paths over which I/O can be provided to the disk device driver. Since the number of I/O requests sent from the disk device driver to the storage device is still limited by the Queue Depth value, many jobs can be queued at a disk device driver pending queue. This can cause a problem during error recovery, as the disk device driver will typically retry all the requests on the queue some number of times (e.g., five retries per queued request). Since the pending queue can become much longer in a multipath configuration environment, this can result in a significant performance degradation during error recovery or, in the worst case, the system hanging, resulting in an application timeout.

In order to address this problem, a storage multipath device driver can implement Queue Depth control at its level to limit the amount of I/O sent to the disk device driver. This process can aid in solving the performance degradation problem at the level of the disk device driver during an error recovery procedure.

However, a further problem can then be introduced during normal (non-error) conditions with heavy or stress I/O at the storage multipath device driver level, especially with certain types of applications that flood a small number of storage devices with very heavy I/O. Under this condition, a large number of jobs can be enqueued at a pending queue of the storage multipath device driver, which can result in severe performance degradation and/or a system hanging event.

It can be appreciated that, absent a Queue Depth limit at the storage multipath device driver level, the disk device driver can become a bottleneck in the error recovery situation. However, if the storage multipath device driver uses the Queue Depth to limit I/O flow, then the storage multipath device driver can become the bottleneck during normal (non-error) conditions with stress I/O.

In US 2004/0194095 A1, “Quality of Service Controller and Method for a Data Storage System”, Lumb et al. disclose that requests for each of a plurality of storage system workloads are prioritized. The requests are selectively forwarded to a storage device queue according to their priorities so as to maintain the device queue at a target queue depth. The target queue depth is adjusted in response to a latency value for the requests, where the latency value is computed based on a difference between an arrival time and a completion time of the requests for each workload. Prioritizing the requests can be accomplished by computing a target deadline for a request based on a monitored arrival time of the request and a target latency for its workload. To reduce latencies, it is said that the target queue depth may be reduced when the target latency for a workload is less than its computed latency value, and to increase throughput the target queue depth may be increased when the target latency for each workload is greater than each computed latency value.

In U.S. Pat. No. 6,636,909 B1, “Adaptive Throttling for Fiber Channel Disks”, Kahn et al. disclose a method that sends a write request to a disk and, in response to receiving a queue full signal from the disk if the disk queue is full, sets a throttle value. The method is said to seek to avoid triggering a queue full status for a storage device by queueing commands that would overload the storage device in a local software disk driver queue. Since a predefined limit on command issuance is said not to be feasible, initiator devices instead must be able to recognize potential error-producing situations and thereafter limit or throttle the number of commands issued. Accordingly, the method operates by sending a write request to a disk, receiving a queue full signal from the disk if the disk queue is full, and, responsive to receiving the queue full signal, setting a throttle value and thereafter dynamically adjusting the throttle value to maintain the storage device in a steady state.

In U.S. Pat. No. 6,170,042 B1, “Disc Drive Data Storage System and Method for Dynamically Scheduling Queued Commands”, Gaertner et al. disclose a data storage system and method of scheduling commands in which commands are stored in a command sort queue and a scheduled command queue. Commands in the command sort queue are sorted and assigned a priority. Eventually, commands in the command sort queue are transferred to the scheduled command queue, where commands in the scheduled command queue are executed without further sorting. The desired queue depth, or size, of the scheduled command queue is determined as a function of both the queue depth of the command sort queue and a command execution rate value indicative of the rate at which commands in the scheduled command queue are executed. The desired queue depth can be dynamically determined using the queue depth of the command sort queue and the command execution rate value as inputs to a look-up table. It is said that the data storage system may include a small computer system interface (SCSI) disc (or “disk”) drive that executes commands from a host system.

These various U.S. Patents and the U.S. Patent Publication do not address the specific problems discussed above, and thus do not provide a solution for these problems.

SUMMARY OF THE PREFERRED EMBODIMENTS

The foregoing and other problems are overcome, and other advantages are realized, in accordance with the presently preferred embodiments of these teachings.

In accordance with a computer program product, apparatus and a method there is provided a redundant storage network wherein a host computer operates with at least one storage device by monitoring conditions of the multipath storage network and controlling a storage multipath device driver, in conjunction with an associated storage multipath device input/output (I/O) pending queue, to increase I/O throughput to a storage device driver, such as a disk device driver, when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other aspects of these teachings are made more evident in the following Detailed Description of the Preferred Embodiments, when read in conjunction with the attached Drawing Figures, wherein:

FIG. 1 is a block diagram of a Storage Area Network (SAN) system that is suitable for practicing this invention; and

FIG. 2 is a logic flow diagram that illustrates a method in accordance with exemplary embodiments of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a block diagram of a Storage Area Network (SAN) system 10 that is suitable for practicing this invention. The SAN system 10 includes a host 12 coupled to at least one storage device, such as a disk drive 16, via at least one bus, also referred to herein as a cable or data path 14. The host 12 may be a computer, such as a mainframe computer, a workstation, a personal computer, or any type of computing device. The disk drive 16 may in practice comprise a plurality of disk drives, such as an array of disk drives 15, and may be embodied as a Redundant Array of Inexpensive Disks (RAID). There may be a disk drive controller 19 having overall responsibility for storing data in and reading data from the disk drives 16. The path 14 may be a Fiber Channel (FC) type bus, and may implement a SCSI-type of interface. The exact nature of the path 14 and/or the specifics of the path protocol are not germane to the practice of the exemplary embodiments of this invention.

For the purposes of describing the exemplary embodiments of this invention the host 12 is assumed to include a storage multipath device driver (SMDD) 12A that operates with a storage multipath device I/O pending queue 12B. The host 12 is further assumed to include a disk device driver (DDD) 18A that operates with a disk device driver I/O pending queue 18B.

One or both of the storage multipath device driver 12A and the disk device driver 18A can be implemented using a data processor that executes a stored software program, or with hardware logic elements, or with a combination of software programs and hardware logic elements. The I/O pending queues 12B and 18B may be implemented using read/write memory of any suitable type, such as semiconductor random access memory (RAM).

The host 12 may be assumed to include or be coupled to at least one application (APP), and more typically a plurality of applications (APP1, APP2, . . . , APPn), at least some of which perform disk-based I/O via the storage multipath device driver 12A. Typically the storage multipath device driver 12A is coupled to the applications (APP1, APP2, . . . , APPn) via an operating system 13.

For the purposes of this invention the bus 14 may be considered to be a path, and in practice there may be a plurality of paths (i.e., multipaths) between the host 12 and the storage devices. This can be implemented using at least one host adapter (HA) 12C coupled to at least two paths and to a switching fabric 20 from which multiple paths emanate to the storage devices. The use of multiple paths between the host 12 and the storage devices provides redundancy and avoids the generation of a single point of failure (POF). Through the use of the plurality of paths 14 the SAN 10 may be considered to be a redundant SAN. The storage multipath device driver 12A is assumed to have knowledge of the operational status of the various paths 14 connecting the host 12 to the storage devices 16. Another HA 12C can be used to couple to another storage device or devices 22, either directly or via another switch fabric (not shown).

By way of example, there may be ten storage device LUNs (Logical Unit Numbers), each a disk drive, and there may be eight paths 14 to each LUN.

In accordance with exemplary embodiments of this invention the problems discussed above are solved by the addition of intelligence into the workload management at the level of the storage multipath device driver 12A. As opposed to using a constant depth for the I/O pending queue 12B regardless of workload changes, the storage multipath device driver 12A dynamically adjusts the amount of I/O sent to the disk device driver 18A depending on the change of workload. This technique aids in balancing the size of the storage multipath device driver I/O pending queue 12B, as well as the disk device driver I/O pending queue 18B, under various conditions of normal (non-error) operation and error recovery operation.

By the use of the exemplary embodiments of this invention the I/O throughput is increased when demand from the application(s) is increasing, thus avoiding performance degradation and preventing system hanging caused by the queue depth control implemented by the storage multipath device driver 12A. In the event of an I/O failure, the storage multipath device driver 12A is sensitive to the change and effectively “tunes” the amount of I/O sent to the disk device driver 18A to a smaller value to prevent I/O hanging or performance degradation at the level of the disk device driver 18A, as the disk device driver 18A would typically retry some number of times for each I/O request.

At the level of the storage multipath device driver 12A, multiple retries are not performed to the same degree (if at all) as at the disk device driver 18A for each job queued at the I/O pending queue 18B. Once a particular path 14 receives some certain number of consecutive errors it is taken offline. If all the paths 14 are taken offline, the storage multipath device driver 12A may return all of the I/O requests on the I/O pending queue 12B to the application(s), without any retries. Therefore, the storage multipath device driver 12A does not typically encounter the same performance degradation during an error recovery procedure as the disk device driver 18A does.

By adding intelligence in the storage multipath device driver 12A it becomes capable of dynamically sensing and responding to a changing I/O volume, and to an occurrence of I/O errors, so as to efficiently handle both normal (good) conditions and error conditions. The non-limiting embodiments of this invention can be practiced with any storage multipath device driver on any platform through the use of the Queue Depth of the I/O queue 12B to dynamically control I/O flow.

For the implementation of the exemplary embodiments of this invention a set of rules is established to control the I/O flow in order to avoid performance degradation and/or system hanging during stress I/O and/or error recovery. The set of rules is established in consideration of at least the following elements (a sketch of these elements as a simple parameter structure follows the list):

(a) a Queue Depth value recommended by the storage device controller 19 (e.g., the controller of the disk drive(s) 16);

(b) a Length of the storage multipath device driver I/O pending queue 12B (where all unprocessed I/O requests are queued) when the storage multipath device driver 12A begins to experience a performance degradation during a stress I/O (high volume) condition;

(c) a Length of the disk device driver I/O pending queue 18B when the disk device driver 18A begins to experience a performance degradation during error recovery; and

(d) a Factor of the Queue Depth used by the storage multipath device driver 12A to control the I/O flow to the disk device driver 18A when a performance degradation begins to be experienced during error recovery.
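
By way of illustration only, the four elements above can be collected into a small set of tuning parameters. The following is a minimal sketch in Python; the class and field names are hypothetical, chosen for this description and not part of the embodiments themselves:

    from dataclasses import dataclass

    @dataclass
    class TuningRules:
        # (a) Queue Depth recommended by the storage device controller 19.
        queue_depth: int
        # (b) Length of the SMDD I/O pending queue 12B at which performance
        # degradation begins during stress I/O.
        smdd_stress_queue_length: int
        # (c) Length of the DDD I/O pending queue 18B at which performance
        # degradation begins during error recovery.
        ddd_recovery_queue_length: int
        # (d) Factor of the Queue Depth used by the SMDD 12A to throttle the
        # I/O flow to the DDD 18A during error recovery.
        recovery_queue_depth_factor: int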

After determining the above elements, the storage multipath device driver 12A uses these elements to implement the following logic. Reference is also made to the logic flow diagram of FIG. 2.

During a normal condition, at Block A the storage multipath device driver 12A calculates an amount of I/O sent to the disk device driver 18A using the following formula:

Total amount of I/O sent to the disk device driver 18A on a device = Queue Depth × Global Factor for Queue Depth × Total Number of Functioning Paths,

where the Global Factor for Queue Depth = 1. Note that the Global Factor for Queue Depth is preferably a factor of the Queue Depth value used by all of the multipath storage devices 16.
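
Expressed as code, the Block A calculation is a single product. The sketch below is an illustration only, assuming the formula just given; the function and parameter names are hypothetical:

    def normal_io_budget(queue_depth: int, global_factor: int,
                         functioning_paths: int) -> int:
        # Block A: total amount of I/O the SMDD 12A may send to the
        # DDD 18A for one device under normal (non-error) conditions.
        return queue_depth * global_factor * functioning_paths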

The “normal condition” may be considered to be one where there is an absence of stress I/O and/or an error condition. As employed herein, “stress I/O” may be considered to be an amount of application-initiated storage device activity that exceeds a normal amount of activity by some predetermined amount. The predetermined amount may be fixed, or it may be variable depending on system conditions. For example, on a host 12 equipped with the AIX™ operating system 13 (AIX™ is an open operating system, based on UNIX™, that is available from the assignee of this patent application), the following table defines a suitable rule for this implementation, where n is the number of I/O requests on the pending queue:

    Length of Pending Queue    Global Factor for Queue Depth (for all storage devices)
    n >= 1200                  3
    800 < n < 1200             2
    n < 800                    1
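
The table above maps the pending-queue length n to a Global Factor for Queue Depth. A minimal sketch of that rule, assuming the AIX™ thresholds shown (the boundary case n = 800 is not specified by the table and is mapped to a factor of 1 here; the function name is hypothetical):

    def global_factor_for_queue_depth(n: int) -> int:
        # n is the number of I/O requests on the pending queue 12B.
        if n >= 1200:
            return 3
        if n > 800:        # 800 < n < 1200
            return 2
        return 1           # n < 800 (and the unspecified n == 800 case)

For instance, global_factor_for_queue_depth(1100) returns 2, matching the example that follows.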

For example, if the number of I/O requests on the I/O pending queue 12B is 1100, the calculation performed by the storage multipath device driver 12A using the formula recited above uses a value of 2 for the Global Factor for Queue Depth.

In the stress I/O environment, and at Block B, the storage multipath device driver 12A monitors the length of the I/O pending queue 12B and adjusts the Global Factor for Queue Depth value accordingly to allow more I/O to be sent to the disk device driver 18A.

In the error condition, at Block C, the storage multipath device driver 12A monitors the number of functioning paths and adjusts the individual factor for Queue Depth correspondingly to reduce the amount of I/O sent to the disk device driver 18A. If the percentage of functioning paths of a multipath device (referred to herein as m) is reduced to less than 100%, the storage multipath device driver 12A switches from using the global factor for normal or stress I/O conditions to an individual factor of this disk 16 for controlling the queue depth during the error condition.

For example, and assuming again the non-limiting case of the AIX™ operating system 13 installed on the host 12, the following illustrates a suitable rule for use in the implementation, where the Individual Factor for Queue Depth (per multipath device) is denoted as f:

if 50% < m < 100% and the Global Factor for Queue Depth >= 2, then f = Global Factor for Queue Depth − 1; and

if m <= 50%, then f = 1.
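
Read as code, one plausible interpretation of this rule is sketched below, where m is expressed as a fraction of functioning paths. It is assumed here that when 50% < m < 100% but the Global Factor is already 1, the factor is left unchanged, and that a fully functioning device (m = 100%) continues to use the global factor; the function name is hypothetical:

    def individual_factor(m: float, global_factor: int) -> int:
        # m: fraction of functioning paths for this multipath device (0.0-1.0).
        if m <= 0.5:
            return 1                    # m <= 50%: throttle hardest
        if m < 1.0 and global_factor >= 2:
            return global_factor - 1    # 50% < m < 100%: back off by one
        return global_factor            # assumed: otherwise unchanged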

For example, assume in an exemplary case that the total number of paths to the storage devices 16 is eight, that the number of functioning paths is six, and that the global factor for queue depth is three. In this case the condition 50% < m < 100% is satisfied. Therefore, the following calculation is performed by the storage multipath device driver 12A:

Total amount of I/O sent to the disk device driver 18A = Queue Depth × (Global Factor for Queue Depth − 1) × Total Number of Functioning Paths.
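
The worked example can be reproduced with the sketches above. The Queue Depth value itself is not given in the example, so a hypothetical value of 20 is assumed purely to make the arithmetic concrete:

    QUEUE_DEPTH = 20                            # hypothetical; not given in the text
    m = 6 / 8                                   # 75%, so 50% < m < 100% holds
    f = individual_factor(m, global_factor=3)   # f = 3 - 1 = 2
    total = QUEUE_DEPTH * f * 6                 # Queue Depth x f x functioning paths
    print(f, total)                             # -> 2 240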

Based on the foregoing discussion it should be appreciated that, by implementing the described methods in the storage multipath device driver 12A, the storage multipath device driver 12A is enabled to dynamically adjust the I/O flow to the disk device driver 18A based on the workload and the presence or absence of I/O errors, so as to avoid performance degradation or system hanging in stress I/O and in error conditions.

A feature of the exemplary embodiments of this invention is that the bandwidth between the host 12 and the storage devices 16 can be adjusted corresponding to I/O conditions to avoid the generation of a bottleneck at either the storage multipath device driver 12A or the disk device driver 18A.

A further feature of the exemplary embodiments of this invention is that overloading of the storage multipath device driver 12A is avoided during non-error conditions (including during stress I/O conditions), and overloading of the disk device driver 18A is avoided during error conditions.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the embodiments of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. For example, while the storage devices have been described as the disk drives 16 coupled to the disk device driver 18A, in other embodiments other types of storage devices may be used, such as tape storage devices and semiconductor memory-based storage devices. The DDD 18A may thus be referred to more generally as a storage device driver, and the associated I/O queue 18B as a storage device driver I/O pending queue. Further, the disk drives 15 may be based on magnetic technology, or on optical technology, and may use fixed or removable storage media. Still further, it can be appreciated that the SMDD 12A may be responsive to a plurality of different error conditions, such as errors arising in one or more of the disk drives 16, the disk drive controller 19, the switch fabric 20 and/or the HA 12C. Further in this regard the error condition processing performed by the SMDD 12A may be tailored, if desired, in accordance with the source of the error, and may thus be adaptive in nature. However, all such modifications of the teachings of this invention will still fall within the scope of the embodiments of this invention.

Furthermore, some of the features of the embodiments of this invention may be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles, teachings and embodiments of this invention, and not in limitation thereof.

CLAIMS

1. A computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program, when executed on the computer, causes the computer to operate with at least one storage device in a redundant storage network by operations comprising: monitoring operation of the redundant storage network; and controlling a storage multipath device driver in conjunction with an associated storage multipath device input/output (I/O) pending queue to increase I/O throughput to a storage device driver when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

2. The computer program product as in claim 1 where, during a normal I/O operating condition, controlling the storage multipath device driver comprises calculating an amount of I/O sent to the storage device driver using: total amount of I/O sent to storage device driver = queue depth × global factor for queue depth × total number of functioning paths, where global factor for queue depth = 1.

3. The computer program product as in claim 2 where, during a stress I/O operating condition, controlling the storage multipath device driver comprises monitoring a length of the I/O pending queue and adjusting the global factor for queue depth value accordingly to allow more I/O to be sent to the storage device driver.

4. The computer program product as in claim 2 where, during an error condition, controlling the storage multipath device driver comprises monitoring a number of functioning paths and adjusting an individual factor for queue depth correspondingly to reduce the amount of I/O sent to the storage device driver, where if a percentage of functioning paths is reduced to less than 100%, the storage multipath device driver switches from using the global factor for queue depth value to an individual factor of a storage device for controlling I/O pending queue depth.

5. The computer program product as in claim 1, where the at least one storage device comprises a disk storage device.

6. A system comprising a redundant storage network that includes a host coupled via a plurality of paths to at least one storage device, said host comprising a storage multipath device driver coupled with a storage multipath device input/output (I/O) pending queue that is coupled to a storage device driver comprised of a storage device driver I/O queue, said storage multipath device driver operable for monitoring conditions of the redundant storage network to increase I/O throughput to the storage device driver when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

7. The system as in claim 6 where, during a normal I/O operating condition, the storage multipath device driver calculates an amount of I/O sent to the storage device driver using: total amount of I/O sent to storage device driver = queue depth × global factor for queue depth × total number of functioning paths, where global factor for queue depth = 1.

8. The system as in claim 7 where, during a stress I/O operating condition, the storage multipath device driver monitors a length of the I/O pending queue and adjusts the global factor for queue depth value accordingly to allow more I/O to be sent to the storage device driver.

9. The system as in claim 7 where, during an error condition, the storage multipath device driver monitors the number of functioning paths and adjusts an individual factor for queue depth correspondingly to reduce the amount of I/O sent to the storage device driver, where if a percentage of functioning paths is reduced to less than 100%, the storage multipath device driver switches from using the global factor for queue depth value to an individual factor of a storage device for controlling I/O pending queue depth.

10. The system as in claim 6, where the at least one storage device comprises a disk storage device.

11. A host comprising a multipath interface for coupling via a plurality of paths to at least one storage device, said host comprising a storage multipath device driver comprising a storage multipath device input/output (I/O) pending queue, said storage multipath device driver coupled to a storage device driver that comprises a storage device driver I/O queue, said storage multipath device driver operable to increase I/O throughput to the storage device driver when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

12. The host as in claim 11 where, during a normal I/O operating condition, the storage multipath device driver calculates an amount of I/O sent to the storage device driver using: total amount of I/O sent to storage device driver = queue depth × global factor for queue depth × total number of functioning paths, where global factor for queue depth = 1.

13. The host as in claim 12 where, during a stress I/O operating condition, the storage multipath device driver monitors a length of the I/O pending queue and adjusts the global factor for queue depth value accordingly to allow more I/O to be sent to the storage device driver.

14. The host as in claim 12 where, during an error condition, the storage multipath device driver monitors the number of functioning paths and adjusts an individual factor for queue depth correspondingly to reduce the amount of I/O sent to the storage device driver, where if a percentage of functioning paths is reduced to less than 100%, the storage multipath device driver switches from using the global factor for queue depth value to an individual factor of a storage device for controlling I/O pending queue depth.

15. The host as in claim 11, where the at least one storage device comprises a disk storage device.

16. A method to operate a host with at least one storage device in a redundant storage network, comprising: monitoring operation of the redundant storage network; and operating a storage multipath device driver in conjunction with an associated storage multipath device input/output (I/O) pending queue to increase I/O throughput to a storage device driver when I/O demand increases, and to decrease I/O throughput to the storage device driver in the event of an I/O error condition.

17. The method as in claim 16 where, during a normal I/O operating condition, operating the storage multipath device driver comprises calculating an amount of I/O sent to the storage device driver using: total amount of I/O sent to storage device driver = queue depth × global factor for queue depth × total number of functioning paths, where global factor for queue depth = 1.

18. The method as in claim 17 where, during a stress I/O operating condition, operating the storage multipath device driver comprises monitoring a length of the I/O pending queue and adjusting the global factor for queue depth value accordingly to allow more I/O to be sent to the storage device driver.

19. The method as in claim 17 where, during an error condition, operating the storage multipath device driver comprises monitoring the number of functioning paths and adjusting an individual factor for queue depth correspondingly to reduce the amount of I/O sent to the storage device driver, where if a percentage of functioning paths is reduced to less than 100%, the storage multipath device driver switches from using the global factor for queue depth value to an individual factor of a storage device for controlling I/O pending queue depth.

20. The method as in claim 16, where the at least one storage device comprises a disk storage device.